Skip to content
This repository was archived by the owner on Nov 7, 2025. It is now read-only.

Conversation

@DongLiang-0
Copy link
Contributor

@DongLiang-0 DongLiang-0 commented Jun 11, 2025

A draft PR using quesma to use doris as the data connection source.
This connection source is modified based on clickhouse, and the discover page can be used normally for the kibana_sample_data_logs table.

@DongLiang-0 DongLiang-0 requested a review from a team as a code owner June 11, 2025 10:56
@CLAassistant
Copy link

CLAassistant commented Jun 11, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@jakozaur jakozaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing proof of concept. Thank you for starting it.

Though to merge it to main we would probably need:

  1. Work on new backend connector. Start by copying platform/clickhouse to platform/doris and make changes there, while avoid changes that would break ClickHouse.
  2. That would probably need also some changes to design.
  3. Similarly with SQL function changes. Ideally we should build abstract syntax tree and render the SQL dialects in expr_string_renderer.go
  4. Make sure you will be able to sign CLA.

}

func (p *BasicSqlBackendConnector) Exec(ctx context.Context, query string, args ...interface{}) error {
logger.Info().Msgf("Exec sql: %s", query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three above make a huge sense for debugging. You may also leverage our internal debug console http://localhost:9999.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx

"strings"
)

type SchemaTypeAdapter struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeAdapter is already an interface https://github.com/QuesmaOrg/quesma/blob/main/platform/schema/registry.go#L54

I would suggest to implement it and then pass to NewSchemaRegistry https://github.com/QuesmaOrg/quesma/blob/main/cmd/main.go#L94

package clickhouse

import (
"crypto/tls"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to create a new directory apache_doris and implement below functionality there. Then proper initDBConnection can be invoked here https://github.com/QuesmaOrg/quesma/blob/main/cmd/main.go#L87

@DongLiang-0
Copy link
Contributor Author

Hey @pdelewski , the two problems you mentioned above are not difficult.
Now the main problem is another one: all the parsing and transform statements in the framework are for ClickHouse(there are some syntax differences between doris and clickhouse). If we forcefully add Doris-related codes, the code may be confusing.

@pdelewski
Copy link
Contributor

Hey @pdelewski , the two problems you mentioned above are not difficult. Now the main problem is another one: all the parsing and transform statements in the framework are for ClickHouse(there are some syntax differences between doris and clickhouse). If we forcefully add Doris-related codes, the code may be confusing.

@DongLiang-0

Hey @pdelewski , the two problems you mentioned above are not difficult. Now the main problem is another one: all the parsing and transform statements in the framework are for ClickHouse(there are some syntax differences between doris and clickhouse). If we forcefully add Doris-related codes, the code may be confusing.

@DongLiang-0 True, but there are at least few options to solve this problem. I will add some comments in different places.

origFunc := origExprTyped
switch origFunc.Name {
case "sum", "sumOrNull", "min", "minOrNull", "max", "maxOrNull":
case "sum", "sumOrNull", "min", "max":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to remove minOrNull and maxOrNull ?. I think that it should be orthogonal

return model.NewFunction("APPROX_COUNT_DISTINCT", model.NewColumnRef(columnName)), "", nil
}

if strings.HasPrefix(origFunc.Name, "quantiles") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, Do we need to remove that?

}

func (p *pancakeSqlQueryGenerator) generatePartitionBy(groupByColumns []model.AliasedExpr) []model.Expr {
func (p *pancakeSqlQueryGenerator) generatePartitionBy(groupByColumns []model.GroupByExpr, useGroupByColumn bool) []model.Expr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to rename old function to something like generatePartitionByViaAliasExpr and add a new function generatePartitionByViaGroupByExpr

}

func (p *pancakeSqlQueryGenerator) generateMetricSelects(metric *pancakeModelMetricAggregation, groupByColumns []model.AliasedExpr, hasMoreBucketAggregations bool) (addSelectColumns []model.AliasedExpr, err error) {
func (p *pancakeSqlQueryGenerator) generateMetricSelects(metric *pancakeModelMetricAggregation, groupByColumns []model.GroupByExpr, hasMoreBucketAggregations bool) (addSelectColumns []model.AliasedExpr, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. I think we can have two functions

}

func (p *pancakeSqlQueryGenerator) isPartOf(column model.Expr, aliasedColumns []model.AliasedExpr) *model.AliasedExpr {
func (p *pancakeSqlQueryGenerator) isPartOf(column model.Expr, aliasedColumns []model.GroupByExpr) *model.AliasedExpr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

}

func (p *pancakeSqlQueryGenerator) isPartOfOrderBy(alias model.AliasedExpr, orderByColumns []model.OrderByExpr) bool {
func (p *pancakeSqlQueryGenerator) isPartOfOrderBy(alias model.GroupByExpr, orderByColumns []model.OrderByExpr) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

}

func (p *pancakeSqlQueryGenerator) addPotentialParentCount(bucketAggregation *pancakeModelBucketAggregation, groupByColumns []model.AliasedExpr) []model.AliasedExpr {
func (p *pancakeSqlQueryGenerator) addPotentialParentCount(bucketAggregation *pancakeModelBucketAggregation, groupByColumns []model.GroupByExpr) []model.AliasedExpr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

return v.VisitTableRef(t)
}

type GroupByExpr struct {
Copy link
Contributor

@pdelewski pdelewski Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the difference between the new GroupByExpr and the existing AliasExpr types — their contents appear to be the same. Introducing this new type triggers many changes in function signatures, and I'm not sure why it's needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In doris syntax, cannot use an alias after group by. For example, cannot writegroup by `col1` as `a`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that, however now GroupByExpr and AliasExpr are the same. Do you plan to change that?


const array = "array"

type functionWithCombinator struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think functionWithCombinator can be parametrized with array prefix

matches := wantedRegex.FindStringSubmatch(source)
if len(matches) == 2 {
return model.NewFunction("toHour", model.NewColumnRef(matches[1])), true
return model.NewFunction("HOUR", model.NewColumnRef(matches[1])), true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be abstract name, that will be later resolved by additional transformation depending on backend

@pdelewski
Copy link
Contributor

Overall, great work — thank you for your contribution! A few comments:

  • I'm not sure about the introduction of GroupByExpr — it seems to duplicate AliasExpr and causes changes in function signatures.

  • Function names used in the parser(s) should remain abstract and be resolved later by a transformation layer that maps logical calls to specific backends.

  • Function name parametrization should happen at the transformation layer, depending on the backend.

  • We should consider introducing a new apache_doris directory to isolate Doris-specific logic.

@DongLiang-0 DongLiang-0 marked this pull request as draft July 1, 2025 03:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants